1,262 research outputs found

    Did slavery make Scotia great?

    Detection of Dispersed Radio Pulses: A machine learning approach to candidate identification and classification

    Searching for extraterrestrial, transient signals in astronomical data sets is an active area of current research. However, machine learning techniques are lacking in the literature concerning single-pulse detection. This paper presents a new, two-stage approach for identifying and classifying dispersed pulse groups (DPGs) in single-pulse search output. The first stage identified DPGs and extracted features to characterize them using a new peak identification algorithm which tracks sloping tendencies around local maxima in plots of signal-to-noise ratio vs. dispersion measure. The second stage used supervised machine learning to classify DPGs. We created four benchmark data sets: one unbalanced and three balanced versions using three different imbalance treatments. We empirically evaluated 48 classifiers by training and testing binary and multiclass versions of six machine learning algorithms on each of the four benchmark versions. While each classifier had advantages and disadvantages, all classifiers with imbalance treatments had higher recall values than those with unbalanced data, regardless of the machine learning algorithm used. Based on the benchmarking results, we selected a subset of classifiers to classify the full, unlabelled data set of over 1.5 million DPGs identified in 42,405 observations made by the Green Bank Telescope. Overall, the classifiers using a multiclass ensemble tree learner in combination with two oversampling imbalance treatments were the most efficient; they identified additional known pulsars not in the benchmark data set and provided six potential discoveries, with significantly fewer false positives than the other classifiers.
    Comment: 13 pages, accepted for publication in MNRAS, ref. MN-15-1713-MJ.R
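The abstract above describes finding local maxima in signal-to-noise ratio (SNR) as a function of dispersion measure (DM). The sketch below is not the paper's peak identification algorithm, which is not specified here; it is a minimal illustration of the general idea of a local-maximum search. The function name `find_local_peaks`, the `min_snr` threshold, and the sample values are all assumptions made for the example.

```python
def find_local_peaks(snr, min_snr=5.0):
    """Return indices where snr is a strict local maximum at or above min_snr.

    Hypothetical helper for illustration only; the threshold value is assumed.
    """
    peaks = []
    # Skip the endpoints: they have only one neighbour to compare against.
    for i in range(1, len(snr) - 1):
        rising_before = snr[i - 1] < snr[i]
        falling_after = snr[i] > snr[i + 1]
        if snr[i] >= min_snr and rising_before and falling_after:
            peaks.append(i)
    return peaks


# Made-up sample data: trial DM values and the SNR measured at each one.
dm = [10, 20, 30, 40, 50, 60, 70]
snr = [4.0, 5.5, 8.2, 6.1, 5.0, 5.9, 4.8]

peak_indices = find_local_peaks(snr)
peak_dms = [dm[i] for i in peak_indices]
print(peak_dms)
```

A real single-pulse pipeline would also need to handle noise, plateaus, and overlapping pulse groups, which this sketch ignores.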

    CHE 312 - Chemical Process Safety

    CHE 312-102: Chemical Process Safety

    Quality Assessment and Prediction in Software Product Lines

    At the heart of product line development is the assumption that through structured reuse later products will be of a higher quality and require less time and effort to develop and test. This thesis presents empirical results from two case studies aimed at assessing the quality aspect of this claim and exploring fault prediction in the context of software product lines. The first case study examines pre-release faults and change proneness of four products in PolyFlow, a medium-sized, industrial software product line; the second case study analyzes post-release faults using pre-release data over seven releases of four products in Eclipse, a very large, open source software product line. The goals of our research are (1) to determine the association between various software metrics, as well as their correlation with the number of faults at the component/package level; (2) to characterize the fault and change proneness of components/packages at various levels of reuse; (3) to explore the benefits of the structured reuse found in software product lines; and (4) to evaluate the effectiveness of predictive models, built on a variety of products in a software product line, to make accurate predictions of pre-release software faults (in the case of PolyFlow) and post-release software faults (in the case of Eclipse). The research results of both studies confirm, in a software product line setting, the findings of others that faults (both pre- and post-release) are more highly correlated to change metrics than to static code metrics, and are mostly contained in a small set of components/packages. The longitudinal aspect of our research indicates that new products do benefit from the development and testing of previous products. The results also indicate that pre-existing components/packages, including the common components/packages, undergo continuous change, but tend to sustain low fault densities.
However, this is not always true for newly developed components/packages. Finally, the results also show that predictions of pre-release faults in the case of PolyFlow and post-release faults in the case of Eclipse can be made accurately from pre-release data, and furthermore, that these predictions benefit from information about additional products in the software product lines.

    Using and Interpreting the Bayesian Optimization Algorithm to Improve Early Stage Design of Marine Structures.

    Early stage naval structural design continues to advance as designers seek to improve the quality and speed of the design process. The early stages of design produce preliminary dimensions or scantlings which control the cost and structural performance of a vessel. Increased complexity in the evaluation of structural response has led to a need for efficient algorithms well suited to solving structural design specific optimization problems. As problem sizes increase, existing optimizers can become slow or inaccurate. The Bayesian Optimization Algorithm (BOA) is presented as one solution to efficiently solve problems in the structural design optimization process. The Bayesian optimization algorithm is an Estimation of Distribution Algorithm (EDA) that uses a statistical sample of potential design solutions to create and train a Bayesian network (BN). The application of BNs is well suited for nearly decomposable problem composition which closely matches rules based structural design evaluation. This makes the BOA well suited to solve complex early stage structural optimization problems. Additionally, the learning processes used to create and train the BNs can be analyzed and interpreted to capture design knowledge. This return of knowledge to the designer helps to improve designer intuition and model synthesis in the face of more complex and intricate models. The BNs are thus analyzed to augment design problem understanding and explore trade-offs within the design space. The result matches a paradigm shift in early stage optimization of naval structures. Designers gain better understanding of critical design variables and their interactions as compared to the previous focus on the single most optimal solution. 
This leads to efficient simulations which rapidly explore design spaces, document critical design variable relationships and enable the designer to create better early stage design solutions.
    PhD, Naval Architecture and Marine Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133317/1/tedevine_1.pd

    Searching for Needles in the Cosmic Haystack

    Searching for pulsar signals in radio astronomy data sets is a difficult task. The data sets are extremely large, approaching the petabyte scale, and are growing larger as instruments become more advanced. Big Data brings with it big challenges. Processing the data to identify candidate pulsar signals is computationally expensive and must utilize parallelism to be scalable. Labeling benchmarks for supervised classification is costly. To compound the problem, pulsar signals are very rare, e.g., only 0.05% of the instances in one data set represent pulsars. Furthermore, there are many different approaches to candidate classification with no consensus on a best practice. This dissertation is focused on identifying and classifying radio pulsar candidates from single pulse searches. First, to identify and classify Dispersed Pulse Groups (DPGs), we developed a supervised machine learning approach that consists of RAPID (a novel peak identification algorithm), feature extraction, and supervised machine learning classification. We tested six algorithms for classification with four imbalance treatments. Results showed that classifiers with imbalance treatments had higher recall values. Overall, classifiers using multiclass RandomForests combined with Synthetic Minority Over-sampling TEchnique (SMOTE) were the most efficient; they identified additional known pulsars not in the benchmark, with fewer false positives than other classifiers. Second, we developed a parallel single pulse identification method, D-RAPID, and introduced a novel automated multiclass labeling (ALM) technique that we combined with feature selection to improve execution performance. D-RAPID improved execution performance over RAPID by a factor of 5. We also showed that the combination of ALM and feature selection sped up the execution performance of RandomForest by 54% on average with less than a 2% average reduction in classification performance.
Finally, we proposed CoDRIFt, a novel classification algorithm that is distributed for scalability and employs semi-supervised learning to leverage unlabeled data to inform classification. We evaluated and compared CoDRIFt to eleven other classifiers. The results showed that CoDRIFt excelled at classifying candidates in imbalanced benchmarks with a majority of non-pulsar signals (>95%). Furthermore, CoDRIFt models created with very limited sets of labeled data (as few as 22 labeled minority class instances) were able to achieve high recall (mean = 0.98). In comparison to the other algorithms trained on similar sets, CoDRIFt outperformed them all, with recall 2.9% higher than the next best classifier and a 35% average improvement over all eleven classifiers. CoDRIFt is customizable for other problem domains with very large, imbalanced data sets, such as fraud detection and cyber attack detection.
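The abstract above reports results as recall and notes that pulsars make up only 0.05% of instances in one data set. As a hedged illustration of why recall, not accuracy, is the informative metric at that class ratio, the sketch below scores a trivial classifier that labels everything as non-pulsar. The data set size of 10,000, the label encoding (1 = pulsar, 0 = non-pulsar), and the helper names are assumptions for the example and do not come from the dissertation.

```python
def recall(y_true, y_pred):
    """Fraction of true positives (label 1) that the classifier recovered."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all instances labelled correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Assumed toy data: 10,000 candidates, 5 of them pulsars (0.05%).
y_true = [1] * 5 + [0] * 9995

# A trivial classifier that never predicts "pulsar".
y_pred = [0] * 10000

print(accuracy(y_true, y_pred))  # very high, despite finding nothing
print(recall(y_true, y_pred))    # zero: every pulsar was missed
```

Under these assumed numbers the trivial classifier is almost always "right" yet recovers no pulsars, which is the situation imbalance treatments such as oversampling are meant to address.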

    Influence of Steroids and Gonadotropins on Reproduction in Beef Cattle

    The objective of this thesis was to evaluate the influence of steroids and gonadotropins on reproduction in beef cattle. In experiment 1, beef heifers were used to determine the influence of growth-promoting implants on growth, reproductive development, estrous behavior, and pregnancy rate. Heifers were assigned to 1 of 4 implant treatment groups: control (CON); trenbolone acetate (TBA); trenbolone acetate plus estradiol (TBA+E2); or zeranol (ZER). Heifers were implanted once, artificially inseminated (A.I.), and exposed to a bull during this experiment. Body weight, BCS, HH, RTS, estrous behavior and pregnancy data were collected throughout this experiment. Average daily gain of heifers was greater for TBA+E2 heifers. Fewer heifers treated with ZER were classified with a cyclic RTS on d 106 than CON and TBA treated heifers, while heifers treated with TBA+E2 were similar to all treatments. Heifers treated with TBA had increased mounts during estrus compared with all other treatments. Overall and A.I. pregnancy rates did not differ among treatments. In experiment 2, superovulated beef donors were utilized to determine the feasibility of performing a cow-side LH assay (PrediBov®) on superovulated donors, with emphasis on determining how to use the results in a commercial program. Donors were subjected to superstimulation; blood samples were collected starting at CIDR removal, continuing every 6 h until a positive test was acquired or 36 h after CIDR removal. Whole blood (0.5 mL) was submitted to the assay and donors were inseminated approximately 12 and 24 h after a positive test or onset of estrus. The majority of positive LH tests occurred within 12 to 24 h after CIDR removal. Forty-four percent of the positive tests occurred 0 to 6 h after the onset of estrus. Donors that were inseminated 6 to 10 h after a positive LH test produced more viable and grade 1 embryos than donors inseminated either < 6 or 10 to 14 h after a positive test.
There were no differences in embryo production between insemination times from the onset of estrus or between donors inseminated approximately 12 and 24 h after a positive test or the onset of estrus.